Remedies against the Vocabulary Gap in Information Retrieval
Abstract
Search engines rely heavily on term-based approaches that represent queries and documents as bags of words. A text (a document or a query) is represented by the bag of its words, which ignores grammar and word order but retains word frequency counts. Given a search query, the engine ranks documents by relevance scores computed from, among other signals, the degree to which query terms match document terms. While term-based approaches are intuitive and effective in practice, they rest on the hypothesis that documents containing the exact query terms are highly relevant, regardless of query semantics. Conversely, term-based approaches treat documents that do not contain the query terms as irrelevant. However, a high matching degree at the term level does not necessarily imply high relevance and, vice versa, documents that match none of the query terms may still be relevant. This mismatch stems from the vocabulary gap between queries and documents, which occurs when the two use different words to describe the same concepts. This dissertation addresses how to alleviate the effects of this vocabulary gap. More specifically, we propose (1) methods to formulate an effective query from complex textual structures and (2) latent vector space models that circumvent the vocabulary gap in information retrieval.
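The term-matching behaviour described above, and how it breaks down under a vocabulary gap, can be illustrated with a minimal Python sketch. The scoring function is a deliberately simplified stand-in (unweighted term-frequency overlap), not the ranking function of any particular engine or of this dissertation; the names `bag_of_words` and `term_match_score` and the example texts are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens.

    Grammar and word order are discarded; only term frequencies remain.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def term_match_score(query: str, document: str) -> int:
    """Toy relevance score: for each distinct query term, add its frequency in the document.

    Assumption: no term weighting (no IDF, no length normalization); this is
    only a stand-in for term-based rankers such as TF-IDF or BM25.
    """
    doc_bag = bag_of_words(document)
    return sum(doc_bag[term] for term in bag_of_words(query))


if __name__ == "__main__":
    # Word order is lost: both sentences yield the same bag of words.
    print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True

    query = "car repair"
    doc_a = "Car repair shops fix your car quickly."
    doc_b = "Our garage services automobiles and mends vehicles."
    print(term_match_score(query, doc_a))  # 3 ("car" twice, "repair" once)
    print(term_match_score(query, doc_b))  # 0: no shared terms
```

The second document is plausibly relevant to the query, yet it scores zero because it describes the same concept with different words ("automobiles", "mends", "vehicles"). Weighting schemes such as TF-IDF or BM25 would change the first score but not the zero, since they too only reward terms shared by query and document; this is the vocabulary gap the abstract refers to.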
Similar resources
Information Retrieval for Bridging Vocabulary Gap between Health Seekers and Providers
In this paper we describe how to bridge the vocabulary gap between health seekers and providers using a novel scheme that codes medical records by jointly using local mining and global mining. Local mining uses individual medical records to drive a conclusion about individual health map into the authenticated terminology. Global mining combines medical records of similar types and analyzes them to drive ...
Factors Affecting Student's Scientific Information Retrieval based on Fuzzy Logic Method Compared to Traditional Method
Background and aim: The aim of this study was to identify the factors affecting students' performance in information retrieval based on fuzzy logic method compared to traditional method. Materials and methods: This survey-descriptive study was performed using a quantitative approach. The research population was 34 PhD students, and the researcher-made questionnaire was used. Data were analyzed...
Studying the Effect of Retrieval Direction during Reading on Productive and Receptive Knowledge of Vocabulary
Retrieval tasks provide learners with an opportunity to focus both on meaning and on form. There are four different retrieval directions. The present study aimed to identify the optimal direction of recall type retrievals during reading and to investigate the outcomes of each one. Forty-eight intermediate EFL learners took part in the study. One of the experimental groups was provided with the ...
Combining Image Context Information
Current techniques for content based image retrieval have known shortcomings that make it difficult to search for images based on their semantic content. This leads to the well-known semantic gap problem. To address this problem, we propose utilizing context information, which is available from multiple sources, such as that generated by the camera at image capture, sensor data, context sources...
Closing the Vocabulary Gap for Computing Text Similarity and Information Retrieval
This paper studies the integration of lexical semantic knowledge in two related semantic computing tasks: ad-hoc information retrieval and computing text similarity. For this purpose, we compare the performance of two algorithms: (i) using semantic relatedness, and (ii) using a conventional extended Boolean model [13] with additional query expansion. For the evaluation, we use two different tes...
Journal: CoRR
Volume: abs/1711.06004
Publication date: 2017